ABSTRACT


A regression model is used by the Office of the Secretary
of Defense (OSD) to predict median rents in order to determine
the Variable Housing Allowance (VHA), a supplement to the Basic
Allowance for Quarters (BAQ). These allowances are paid to
service members in the continental United States. It is this
model that is reviewed in this thesis. Median rental data taken
from the annual VHA survey are used to test this model. The
analysis indicates that the model exhibits lack of fit, rests on
invalid assumptions, and is perhaps not even a "reasonable"
approach. A more sensible approach is used to propose two other
regression models. These models are a Weighted Regression Model
which, like the current model, predicts medians; and an Analysis
of Covariance model which predicts or analyzes the mean rent.
More reasonable predictions of median and mean rent are
indicated by these two models respectively.
THESIS DISCLAIMER


The reader is cautioned that computer programs developed in
this research may not have been exercised for all cases of
interest. While every effort has been made, within the time
available, to ensure that the programs are free of computational
and logic errors, they cannot be considered validated. Any
application of these programs without additional verification
is at the risk of the user.
I. INTRODUCTION


A. BACKGROUND 

VHA, Variable Housing Allowance, is a supplement to the
BAQ, Basic Allowance for Quarters, paid to service members who
live in private housing in the United States. VHA is designed
to aid the service member who is assigned to a "high cost area"
of the United States where the total monthly cost of housing
for a person in the same grade or dependency status exceeds 80%
of the national median for members in the same rank or
dependency status [Ref. 1:p. 2-1]. VHA is computed from the
following equation [Ref. 1:p. 2-2]:

VHA = (local median housing cost by paygrade and marital status)
      - 80% of (national median housing cost by paygrade and
        marital status).                                        (1)
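
As a small illustration of the arithmetic in equation (1), the sketch below computes a VHA figure for one paygrade and marital status. The function name and the dollar figures are hypothetical and chosen only to show the calculation; any further adjustments applied in practice are not modeled.

```python
def vha(local_median: float, national_median: float) -> float:
    """Equation (1): local median housing cost minus 80% of the national median.

    Both medians are assumed to refer to the same paygrade and marital status.
    """
    return local_median - 0.80 * national_median


# Hypothetical figures, used only to show the arithmetic.
example_vha = vha(local_median=700.0, national_median=600.0)
print(example_vha)
```
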
The law specifies that each member's VHA will be determined
by the actual housing costs currently paid by the service
member [Ref. 1:p. 2-2]. VHA rates are computed by the Per Diem
Travel and Transportation Allowance Committee Staff, a subset
of the Office of the Secretary of Defense (OSD), with the aid
of the Defense Manpower Data Center (DMDC). The basic process
by which the rates are computed is as follows:


1. Distinct areas in which military members reside are
determined.

2. Proper sample sizes are determined.

3. Survey samples of housing costs are taken and edited, and
median rents are computed for each category of paygrade,
house type, number of bedrooms, and marital status.

4. Preliminary VHA rates for each area and dependency status
are computed by determining an estimated median rent for
each category using the GPX program which utilizes
various regression analysis techniques and smoothing
procedures. (GPX is the name of the model developed by
OSD.)

5. Preliminary VHA rates are reviewed to ensure that the
rates determined by GPX are in line with the cost
guidelines set by Congress.

B. CURRENT VHA COMPUTATIONAL PROCESS 

The computation of preliminary VHA rates for each area
(MHA - military housing area), paygrade, and dependency status
has developed into an extremely complicated process. Once the
median rents are computed for each category of house type,
number of bedrooms, paygrade, and marital status, a count of
the number of median rents per category is taken [Ref. 1:p.
2-56]. If the number of counts in each category for a particular
MHA is too small, then larger sample sizes are obtained by
incorporating median rent information from the same category
from a geographically close MHA [Ref. 1:p. 2-58]. This
information, taken from these close MHAs, is then weighted:
the closer, in terms of miles, an MHA is to the original MHA,
the more weight is placed on the information from that MHA
[Ref. 1:p. 2-59]. A new vector of median rents, incorporating
the information from the geographically close MHAs and
dimensioned by the four categories above, is calculated [Ref.
1:p. 2-59]. The underlying reason for finding this vector of
median rents is to find the relationship between the total pay
of a military member and the amount of rent a military member
will pay [Ref. 1:p. 2-60]. Let $P_{ijkl}$ = the total pay for a
person in the ith paygrade, in the jth dependency status, who
has 'k' number of bedrooms in his or her home and an 'l' type
of home. Let $T_{ijkl}$ equal the median rent for military
members in that same group. Then the current regression model
in use is:


$$\frac{1}{T_{ijkl}} = A + B\,\frac{1}{P_{ijkl}} + E_{ijkl} \qquad (2)$$
where $E_{ijkl}$ is the error term. Standard linear regression
techniques are used to estimate A and B. These techniques assume
that the error is normally distributed and homoscedastic, with
mean zero. This in turn means that the distribution of inverted
median rent is normal and homoscedastic. It is not clear that
these assumptions are in any sense "reasonable". In fact, if
medians tend to be normal, then the inverse will certainly not
be normal. Let $\hat{A}$ and $\hat{B}$ denote the regression
estimates of A and B, respectively. The estimates $\hat{A}$ and
$\hat{B}$ are used to determine the estimated median rents,
$R_{ijkl}$, through the equation

$$R_{ijkl} = \frac{1}{\hat{A} + \hat{B}/P_{ijkl}} \qquad (3)$$
where $R_{ijkl}$ and $P_{ijkl}$ denote the rent and total pay,
respectively, for paygrade, marital status, number of bedrooms
and house type [Ref. 1:p. 2-60]. Generally, a separate $\hat{A}$
and $\hat{B}$ are determined for the enlisted, company grade
officer, and field grade officer ranks. Thus a separate
$R_{ijkl}$ is computed for each one of these three ranks of
military personnel. $R_{ijkl}$ is then used to determine owner
equivalency median rents. Owner equivalency rents are the rent
figures assigned to a military member who owns and does not
rent his or her residence. Costs assigned to owners are thought
not to be appropriate for use in calculating VHA since
intangible benefits accrue to owners and not to renters. These
owner equivalency median rents are weighted according to
population percentage of owners and are then incorporated into
the vector of median rents [Ref. 1:p. 2-61]. This new vector of
median rents, including both owner and renter information,
still has four dimensions and must then be aggregated to the
paygrade and dependency status level [Ref. 1:p. 2-61]. After
this aggregation, a further smoothing process and a
denormalization process, the VHA rate multipliers are finally
computed by dividing by a weighted average of BAQ rates [Ref.
1:p. 2-63]. These multipliers are checked and if an inversion
exists, which, for example, is when paygrade O2 receives less
VHA than paygrade O1, then additional smoothing across
paygrades will take place. If inversions still exist after the
smoothing process has taken place then the entire computation
of VHA multiplier rates begins again from the point where data
from geographically close MHAs is used [Ref. 1:p. 2-64]. Median
rent information is then taken from these MHAs and the entire
process is run again and again, up to 11 more times, until the
rate inversions cease to exist. If after 11 more times an
inversion still exists then the GPX program aborts and an
inversion in the total population data is assumed [Ref. 1:p.
2-64].
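
The core of the process described above, regressing inverted median rent on inverted total pay and then inverting the fitted line to recover an estimated median rent, can be sketched as follows. This is a minimal illustration and not the GPX program: the function names are hypothetical, the geographic weighting and smoothing steps are omitted, and the inversion check simply assumes the rates are listed in ascending paygrade order.

```python
import numpy as np


def fit_inverse_model(pay, median_rent):
    """Fit 1/rent = A + B * (1/pay) by ordinary least squares.

    Returns the estimates (a_hat, b_hat).
    """
    x = 1.0 / np.asarray(pay, dtype=float)
    y = 1.0 / np.asarray(median_rent, dtype=float)
    b_hat, a_hat = np.polyfit(x, y, 1)  # polyfit returns the slope first
    return a_hat, b_hat


def predict_rent(a_hat, b_hat, pay):
    """Invert the fitted line to obtain an estimated median rent."""
    return 1.0 / (a_hat + b_hat / np.asarray(pay, dtype=float))


def has_inversion(rates_by_paygrade):
    """True if any higher paygrade receives less than the one below it."""
    rates = np.asarray(rates_by_paygrade, dtype=float)
    return bool(np.any(np.diff(rates) < 0))


# Hypothetical pay and median rent figures, for illustration only.
pay = np.array([1200.0, 1500.0, 2000.0, 2600.0, 3300.0])
rent = np.array([380.0, 430.0, 500.0, 560.0, 620.0])
a_hat, b_hat = fit_inverse_model(pay, rent)
estimated_rent = predict_rent(a_hat, b_hat, pay)
```
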


C. PROPOSED PLAN TO UPDATE VHA COMPUTATIONAL PROCESS 

In an effort to get away from the geographical weighting
of data from close proximity MHAs and in an attempt to
simplify the process of computing VHA rates, the Per Diem
Committee is investigating a new method for computing VHA
rates. Under this "new" plan, survey data from each MHA is
placed into various costing bands based on county rental data
from HUD (Department of Housing and Urban Development) in the
following manner. For each county in the United States, HUD
has data for the average rental costs in that county. A
military housing area is placed into a costing band with other
military housing areas which have the same average rental
costs. Therefore if the computed average rental cost for MHA
A is $260.00 and the median rental cost for MHA B is also
$260.00, MHA A and MHA B would be placed in the same costing
band. The computed median rent figure used in this "new"
process is a single figure found by taking a weighted average
of rental costs, based on number of bedrooms and house type,
from the national military population. For example, if 10% of
the national military population resides in one bedroom
apartments, the average rental cost of one bedroom apartments
for that MHA accounts for 10% of the total average rental cost
figure for that county. Initially the bands will be broken
into groups of $45.00 increments. The costing bands begin at
an average rental cost of $260.00 and continue up to $890.00.
There is one further costing band which accounts for the
extremely high average rental cost areas, such as Alaska, which
are so far above all of the other areas in terms of cost. Thus
there are a total of 15 different costing bands including the
"high" costing band. The idea behind grouping together military
housing areas which have similar average rental costs is to
provide more data points to reliably predict median rental
costs per paygrade and dependency status based on the survey
data. Also, using an "outside" source, other than the military,
to group the data provides a small means of getting away from
the military raising its own VHA rates. The "intent of VHA is
not to reimburse the military member for what he or she pays
for housing costs but to enable the military person to live in
adequate housing in whichever area he or she is assigned" (see
footnote 1).
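
The banding rule and the weighted average described above can be sketched as follows. With $45.00 increments from $260.00 to $890.00 there are 14 regular bands, which together with the "high" band gives the 15 bands stated in the text. The treatment of the band edges and of costs below $260.00 is not specified in the text and is an assumption here, as are the function and constant names.

```python
BAND_START = 260.0  # lowest average rental cost of the regular bands
BAND_WIDTH = 45.0   # band increment stated in the text
BAND_END = 890.0    # upper limit of the regular bands
N_REGULAR = int((BAND_END - BAND_START) / BAND_WIDTH)  # 14 regular bands


def costing_band(avg_rent: float) -> int:
    """Return a band number from 1 to 15; band 15 is the "high" band.

    Assumptions: each band includes its lower edge, costs below the first
    band are placed in band 1, and costs at or above the upper limit are
    placed in the "high" band.
    """
    if avg_rent >= BAND_END:
        return N_REGULAR + 1
    if avg_rent < BAND_START:
        return 1
    return int((avg_rent - BAND_START) // BAND_WIDTH) + 1


def weighted_average_rent(avg_rent_by_category, national_share_by_category):
    """Weight each category's average rent by its national population share."""
    return sum(
        avg_rent_by_category[cat] * share
        for cat, share in national_share_by_category.items()
    )
```

Two areas fall in the same band whenever `costing_band` returns the same number for their average rental costs.
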
The costing bands will be used for two major purposes. One
purpose is, through the use of an appropriate regression model,
to determine owner equivalency housing costs, and the other
purpose is to provide housing cost data when there is
insufficient data in a category to determine a median rent for
that category. Once this needed data is found it will be
incorporated back into the MHA data, and then, within the MHA,
a median rent figure will be computed for each paygrade and
dependency status. This figure will then be utilized in the
congressionally mandated equation, (1), local median rent - 80%
of national median rental cost, to determine the VHA rates for
that MHA. Of course these VHA rates are then subject to
budgetary constraints and congressional approval.

1 From a conversation with Debra Davis, DMDC, June 1989.


D. DATA DESCRIPTION 

The data used to determine VHA rates are collected from
military members who participate in the VHA Survey. The VHA
Survey is taken every other year. The data collected from the
survey are kept by the Defense Manpower Data Center, which is
the repository for all of the data used in the VHA
calculations. The data used in the VHA process consist of raw
survey data taken from each military housing area, and contain
information such as the type of house a military member lives
in (single family home, townhouse, apartment, or mobile home),
how many bedrooms the house contains, whether or not the
military member has any dependents or shares the housing costs
with another military member, and the paygrade and service of
the military member. Also contained in the data for each
military person who participates in the survey are the rental
cost, utility costs, and maintenance cost of the housing. Other
items such as social security numbers, whether the member rents
or owns the housing, and other miscellaneous information are
also part of each data record for that particular military
person.

The data used in this analysis, taken from the 1989 survey,
consist of the paygrade (E1-O9) and dependency status (having
dependents, single, or single and sharing) of the military
member. In addition, the total housing cost for that member,
which consists of the rent plus the maintenance cost plus the
utility and insurance costs, is used. Further information on
the living space for the individual is also needed, such as the
number of bedrooms (1-4) and the type of living space (detached
house, townhouse, apartment, or mobile home). Additionally,
total pay (basic pay + BAQ) has to be associated with each
military member's dependency status and paygrade in order to
perform the regression analysis. These raw data are edited to
reflect only true rental costs, not ownership costs. Thus one
data record used in this analysis consists of information
regarding paygrade, house type, number of bedrooms, dependency
status, total housing costs, and total pay.

From this initial set of data one median rent for each
category of house type, number of bedrooms, marital status,
and paygrade is then computed. Thus data for an individual
costing band, which might have consisted of over 50,000
records, is reduced to a data set containing a maximum of 1104
records, one for each possible combination of paygrade, house
type, number of bedrooms and dependency status.

SAS was used to extract and edit the raw data, match total
pay to paygrade and dependency status, and compute a median
rent figure for each category of paygrade, dependency status,
number of bedrooms, and house type. (An example of this
program can be found in Appendix B.)
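
The reduction from individual survey records to one median per category can be sketched as below. This is an illustration in Python, not the SAS program of Appendix B. The record layout, the function name, and the paygrade coding of 1 through 23 are assumptions; with three dependency statuses, four bedroom counts and four house types, that coding yields 23 x 3 x 4 x 4 = 1104 categories, which agrees with the maximum stated above.

```python
from collections import defaultdict
from itertools import product
from statistics import median

PAYGRADES = range(1, 24)  # assumed coding: 23 paygrades numbered 1-23
DEPENDENCY = ("with dependents", "single", "single and sharing")
BEDROOMS = (1, 2, 3, 4)
HOUSE_TYPES = ("detached house", "townhouse", "apartment", "mobile home")

# Number of possible categories: 23 * 3 * 4 * 4
MAX_CATEGORIES = len(list(product(PAYGRADES, DEPENDENCY, BEDROOMS, HOUSE_TYPES)))


def category_medians(records):
    """Reduce individual records to one (median cost, count) per category.

    Each record is a tuple:
    (paygrade, dependency, bedrooms, house_type, total_housing_cost).
    """
    groups = defaultdict(list)
    for paygrade, dependency, bedrooms, house_type, cost in records:
        groups[(paygrade, dependency, bedrooms, house_type)].append(cost)
    # The count is kept alongside the median because the number of
    # observations behind each median matters for any later weighting.
    return {key: (median(costs), len(costs)) for key, costs in groups.items()}
```
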








E. PROBLEMS WITH THE DATA 

There is one major problem associated with the data used
in the VHA computational process. The data used do not
include data from the military members who are in paygrades E5
and above and who share a residence with another person. These
data, which might provide further information and might enable
a more reliable estimate of median rents for an MHA to be
computed, are not being used. This is a policy decision. It
is a major problem in the computation of VHA rates because
sparsity of data is both one of the basic reasons for the
existence of the "costing band" idea and one of the major
problems associated with the current manner in which VHA rates
are calculated. This policy decision essentially throws away
what could be valuable and informative data and is
contradictory to the purpose of finding "good" estimates of
median rents.


F. PURPOSE OF THESIS 

The main purpose of this thesis will be to test the
validity of the currently used regression model, equation (2).
The data in its newly proposed format of costing bands will be
used. If the current regression model is not found to be
adequate then the second goal of this thesis is to suggest a
better, more sensible model which will more accurately predict
total housing costs for each costing band. Thus this thesis
will basically consist of two different types of analyses and
will analyze the MHA data from two vantage points. Since there
is no explanation as to why an inverse of rent is predicted
linearly by the inverse of pay (equation 2), a more sensible
regression model will be examined to explain the relationship
between total rent and total pay.

A secondary goal of this thesis will be to test the current
and any proposed regression models not only with the data that
is currently assigned to each costing band but also with
fifteen other costing bands comprising data from the original
costing band plus data from the military members who are E5
and above who share housing with another person. Thus thirty
costing bands will be formed and a comparison of the
regression models using the data from the original costing
bands and data from the "new" costing bands will be made. This
is important because it may show that the regression models are
better able to predict housing costs with the added
information, and this in turn will provide better, more
accurate VHA rates.








II. ANALYSIS PROCEDURES 


A. ORDINARY LEAST SQUARES REGRESSION 
Most of the analysis performed in this thesis employs
simple linear regression (ordinary least squares) to test the
various postulated models.
In ordinary least squares regression, a linear model,
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad (4)$$
is used to find the relationship between the $X_i$'s
(independent variables) and the $Y_i$'s (dependent variables).
The random error components are denoted by $\epsilon_i$ and
are assumed to be normally distributed independent random
variables with mean zero and constant variance, $\sigma^2$.
This relationship as described by $\beta_0$ and $\beta_1$ is
used to further predict or estimate other $Y_i$'s. Since
$\beta_0$ and $\beta_1$ are fixed and unknown, $b_0$ and $b_1$
are used to denote the estimates of their values [Ref. 2:p.
11]. With the utilization of these estimators the least
squares regression fitted values are described by [Ref. 2:p.
11],
$$\hat{Y}_i = b_0 + b_1 X_i. \qquad (5)$$


The values for $b_0$ and $b_1$ are determined by minimizing
$$S = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2. \qquad (6)$$

By differentiating this equation first with respect to
$\beta_0$ and then with respect to $\beta_1$, setting these
results equal to zero, and solving for $\beta_0$ and $\beta_1$,
the values for $b_0$ and $b_1$ are found by setting the
solution for $\beta_0$ equal to $b_0$ and the solution for
$\beta_1$ equal to $b_1$. [Ref. 2:p. 13] The rationale behind
this minimization process is to ensure that the predicted ith
value is as "close" as possible (in Euclidean vertical
distance) to the actual ith value for all i. If the model (4)
is correct these estimates have minimum variance among all
unbiased estimates. [Ref. 2:p. 14] Utilizing the method above,
the value for $b_0$ [Ref. 2:p. 14] is given by
$$b_0 = \bar{Y} - b_1 \bar{X} \qquad (7)$$
and the value for $b_1$ [Ref. 2:p. 13] is given by
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}. \qquad (8)$$


The sum of the residuals squared divided by the number of
observations, n, minus two is given by
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (9)$$
and represents the unbiased estimator of the variance about
the regression, $\sigma^2_{Y \cdot X}$ (i.e., the conditional
variance of Y given X) [Ref. 2:p. 21], if the model is correct.
If a postulated model is the true model then
$\sigma^2 = \sigma^2_{Y \cdot X}$. [Ref. 2:p. 23] Thus $s^2$ is
an estimate of $\sigma^2$ if the model is correct. [Ref. 2:p.
23]
The basic assumptions of ordinary least squares regression
are:
1. $E(\epsilon_i) = 0$, $V(\epsilon_i) = \sigma^2$.
2. $\epsilon_i$ and $\epsilon_j$ are uncorrelated,
$Cov(\epsilon_i, \epsilon_j) = 0$.
3. $\epsilon_i$ is a normally distributed random variable with
mean zero and variance $\sigma^2$. Thus the $\epsilon_i$'s are
independent.
4. E(Y|X) = a + bX, the conditional expectation of Y given
X is linear in X.

If assumptions 1 and 2 hold then ordinary least squares
provides the best minimum variance linear unbiased estimates
of $\beta_0$ and $\beta_1$. [Ref. 2:p. 87] If all of the above
assumptions hold then $b_0$ and $b_1$ are the maximum
likelihood estimates of $\beta_0$ and $\beta_1$ and $s^2$ is an
unbiased estimate of $\sigma^2$. [Ref. 2:p. 88]

If the residuals are normally distributed it is then
possible to use the F and t tests to test the significance of
the regression and to test the individual null hypotheses that
$\beta_0$ equals 0 or that $\beta_1$ equals 0. If the null
hypothesis is not rejected and the values for $\beta_0$ and
$\beta_1$ are not deemed different from zero then, of course,
there is no significant linear relationship between the
independent variables and the dependent variables. The t test
statistic is
$$t = \frac{(b_1 - 0)\left\{\sum_{i=1}^{n} (X_i - \bar{X})^2\right\}^{1/2}}{s} \qquad (10)$$
and has a Student's t distribution with n-2 degrees of freedom.
[Ref. 2:p. 26] The F test statistic tests the overall
significance of the regression. The F test statistic is
$$F = \frac{b_1 \left\{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})\right\}}{s^2} \qquad (11)$$
and has 1 and n-2 degrees of freedom. [Ref. 2:p. 32]


The $R^2$ value measures the "proportion of total variation
about the mean $\bar{Y}$ explained by the regression". [Ref.
2:p. 33] $R^2$ is the sum of squares due to regression divided
by the total sum of squares, corrected for the mean $\bar{Y}$,
and is denoted by
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}. \qquad (12)$$

Values for $R^2$ fall between 0 and 1. The closer the value of
$R^2$ is to 1 the better the regression equation explains the
variation of the data about $\bar{Y}$.
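
The estimates and test statistics described in this section can be computed directly from their definitions. The sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and no p-values are computed. For a straight-line fit the square of the t statistic for the slope equals the F statistic, which provides a quick check on the arithmetic.

```python
import numpy as np


def simple_ols(x, y):
    """Least squares fit of y = b0 + b1 * x with the usual summary statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))

    b1 = sxy / sxx                   # slope estimate
    b0 = y.mean() - b1 * x.mean()    # intercept estimate
    fitted = b0 + b1 * x

    s2 = np.sum((y - fitted) ** 2) / (n - 2)  # residual mean square
    t_stat = b1 * np.sqrt(sxx) / np.sqrt(s2)  # t statistic for slope = 0
    f_stat = b1 * sxy / s2                    # F statistic, 1 and n-2 d.f.
    r2 = np.sum((fitted - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
    return {"b0": b0, "b1": b1, "s2": s2, "t": t_stat, "F": f_stat, "R2": r2}
```
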

The "residuals contain all available information on the way 
in which the fitted model fails to properly explain the 
observed variation in the dependent variable Y" [Ref. 2:p. 34]. 
Thus careful examination of the residuals will provide 
indications as to the adequacy of the proposed model. A 
graphic examination of the residuals may provide an indication 
that the model is systematically deficient. Also utilizing a 
lack of fit test may indicate that the model appears to be 
inadequate. 

The lack of fit test breaks the residual sum of squares
into a lack of fit component and a pure error component,
yielding the mean square due to lack of fit, $MS_L$, and the
mean square due to pure error, $s_e^2$. [Ref. 2:p. 37] The
$MS_L$ estimates $\sigma^2$ if the model is correct and
$\sigma^2$ plus a bias term if the model is inadequate. The
value for $s_e^2$ estimates $\sigma^2$. [Ref. 2:p. 37] The lack
of fit test compares the F ratio $MS_L/s_e^2$ with the
$100(1-\alpha)\%$ point of an F distribution with
$(n_r - n_e)$ and $n_e$ degrees of freedom, where $n_r$ equals
the number of degrees of freedom associated with the residual
sum of squares and $n_e$ equals the number of degrees of
freedom associated with the pure error sum of squares. If the
comparison is significant (i.e., the F ratio is greater than
the tabled F value) this then serves as an indication that the
model is inadequate [Ref. 2:p. 37]. If the test is not
significant (i.e., the F ratio value is less than the tabled F
value), this is an indication that "there appears to be no
reason to doubt the adequacy of the model and both pure error
and lack of fit mean squares can be used as estimates of
$\sigma^2$" [Ref. 2:p. 37].
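
A minimal sketch of the lack of fit computation for a straight-line model follows. It assumes that repeated observations are available at some values of X, so that a pure error sum of squares can be formed; the function name is hypothetical, and the comparison with the tabled F value is left to the reader.

```python
import numpy as np


def lack_of_fit(x, y):
    """Return (F ratio, lack of fit d.f., pure error d.f.) for a line fit.

    Requires repeated observations at one or more x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)
    ss_resid = np.sum((y - (b0 + b1 * x)) ** 2)
    df_resid = x.size - 2

    # Pure error: variation of y about its own mean at each distinct x.
    ss_pure, df_pure = 0.0, 0
    for level in np.unique(x):
        y_at_level = y[x == level]
        ss_pure += np.sum((y_at_level - y_at_level.mean()) ** 2)
        df_pure += y_at_level.size - 1

    # Lack of fit is what remains of the residual after removing pure error.
    ss_lof = ss_resid - ss_pure
    df_lof = df_resid - df_pure
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_ratio, df_lof, df_pure
```

A large ratio relative to the tabled F value with `df_lof` and `df_pure` degrees of freedom points to an inadequate model.
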

A graphical examination of the residuals, by means of a
scatter plot of the $e_i$'s versus the $\hat{Y}_i$'s, will give
an indication as to whether or not the assumptions of
normality, homoscedasticity and linearity of ordinary least
squares have been violated. If the proposed model is correct,
the resulting residuals should indicate that these assumptions
hold. [Ref. 2:p. 141] If the model is correct a plot of the
residuals versus the fitted values should take the shape of a
horizontal band as shown in Figure 2.1 below [Ref. 2:p. 145].
If the plot of the residuals takes the shape of a funnel as
shown in Figure 2.2 below [Ref. 2:p. 146], the variance,
$\sigma^2$, is not constant and is increasing with x, which
indicates the need either for weighted least squares or for a
transformation on the observations $Y_i$ before performing a
regression analysis. [Ref. 2:p. 147]





Figure 2.1 Satisfactory Residual Plot
[Ref. 2:p. 145]

Figure 2.2 Unsatisfactory Funnel-Shaped Residual Plot
[Ref. 2:p. 146]


B. INITIAL MODELS TESTED USING ORDINARY LEAST SQUARES REGRESSION

The first step in this analysis was to test the model
currently in use, equation (2), to see if it could be used to
predict median rents for each of the thirty costing bands.
The model was tested under several different conditions.
First, the model was run using all of the available data in
each costing band. Next the data was divided by marital status
and within each costing band the model was tested using all of
the data for those military personnel with dependents and then
using all of the data for those military personnel without
dependents. The model was tested under another condition in
which the data was broken down further by paygrades into
enlisted, paygrades 1-9, company grade officers, paygrades
10-19, and field grade officers, paygrades 20-23. Thus the
model was tested within each costing band according to
groupings of the data consisting of enlisted personnel,
company grade officers, and field grade officers. Finally the
current model was tested within each costing band by grouping
the data by a combination of dependency status and paygrade
categories. In this case the data in each costing band was
first broken into groups by dependency status and within each
dependency group, the data was further broken into categories
of enlisted, company grade officer and field grade officer.
For each of the above mentioned conditions in which the
model was tested, the data was plotted, $1/T_{ijkl}$ versus
$1/P_{ijkl}$, the model was tested using Ordinary Least Squares
regression procedures, the residuals were plotted versus the
fitted values of the median rents, $\hat{T}_{ijkl}$, and the
residuals were tested for normality. (These results are given
in the next chapter.)
A review of the results of the regression procedures showed
that the initial model did not seem to adequately describe the
relationship between total pay and median rental costs, nor did
it serve as an adequate predictor of fitted values for median
rental costs, since the assumptions of least squares regression
were violated. Evidence of this includes low $R^2$ values,
non-normality of the residuals, unequal variance of the data,
and an indication of significant lack of fit. This, along with
cross-validation results, is explained in detail in the
analysis portion of this thesis. Therefore a new model was
postulated. The new model was

$$T_{ijkl} = A + B\,P_{ijkl} + E_{ijkl} \qquad (13)$$
in which all of the variables have the same meaning as in the
previous model. The only difference was that the total pay and
median rental cost vectors were not inverted. This model was
tested under all of the same conditions as the initial model.
In other words the model was first tested using all of the
data. The data was then broken into groups by dependency status
and the regression was run again. The data was next broken into
groups by paygrade and ordinary least squares regression was
used to test the model using this data. Finally the data was
broken into groups by a combination of both paygrade and
dependency status and the model was again tested.

The results of the regression analysis testing this model
again indicated that a systematic deficiency in the model
existed; namely that the residuals exhibited a tendency towards
nonconstant variance and that the residuals were not normally
distributed. The nonconstant variance is explainable by the
fact that different medians from different population sizes
will have different variances. Thus a weighted least squares
approach was attempted in conjunction with this model.

C. WEIGHTED LEAST SQUARES REGRESSION

If a postulated model has been tested using ordinary least
squares procedures and examination of the residuals shows a
nonconstant variance, some type of transformation on Y is
necessary. This transformation will change the $\epsilon_i$'s
so that the assumptions of ordinary least squares regression
will hold. [Ref. 2:p. 147] Generally a nonconstant variance
among the residuals indicates that some of the observations are
"less reliable" than others. [Ref. 2:p. 108] In this case the
$\epsilon_i$'s are normally distributed with mean 0 and
variance $\sigma_i^2$ instead of $\sigma^2$. Thus the
$\epsilon_i$'s have variance $v_i \sigma^2$. To combat this
nonconstant variance term, $v_i \sigma^2$, the entire
regression equation
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad (14)$$
is multiplied by the weight, $1/\sqrt{v_i}$. Thus the
regression equation becomes
$$\frac{Y_i}{\sqrt{v_i}} = \frac{\beta_0}{\sqrt{v_i}} + \frac{\beta_1 X_i}{\sqrt{v_i}} + \frac{\epsilon_i}{\sqrt{v_i}}. \qquad (15)$$
Then $E(\epsilon_i/\sqrt{v_i}) = 0$ and
$V(\epsilon_i/\sqrt{v_i}) = E(\epsilon_i^2/v_i) = v_i \sigma^2 / v_i = \sigma^2$.
Thus $\epsilon_i/\sqrt{v_i} \sim N(0, \sigma^2)$. Therefore the
assumptions of ordinary least squares will now hold and
ordinary least squares procedures may now be applied to the
transformed regression equation.

Evidence of nonconstant variance was seen in the residual
plots after OLS regression was applied using the model (13)
for most of the costing bands. This implies, as stated above,
that some of the observations were less reliable than others.
Intuitively this makes sense in this problem since each
observation represents a median cost and not an individual
cost. Thus some observations represent the median of 20 or 30
data points while other observations represent the median of
only 5 data points. This makes the median of only five data
points "less reliable" than the median of 20 or 30 data
points.

In order to transform the model into one in which the
assumptions of ordinary least squares hold, a weight
$v_i^{-1/2}$ must be found. In this case the necessary weight
is $1/s_i$ where
$$s_i = \frac{1.25\, R_i}{1.35 \sqrt{n_i}}. \qquad (16)$$
This is the Gaussian-based approximation (Kendall and Stuart,
1967) of the standard deviation of the median. [Ref. 3:p. 16]
$R_i$ equals the interquartile range for the ith subset of data
and $n_i$ equals the number of data points comprising that
median. The reason for this is that if x is $N(\mu, \sigma^2)$
then the median is approximately
$N\left(\mu, \frac{\pi \sigma^2}{2n}\right)$. From the normal
table, for normal distributions, IQR = 1.35$\sigma$, thus
$$s = \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} = \frac{1.25\,\mathrm{IQR}}{1.35 \sqrt{n}} = \frac{1.25\, R_i}{1.35 \sqrt{n_i}}. \qquad (17)$$

Since the variance of $\epsilon_i$ is $\sigma_i^2$ and since
$s_i$ is an estimate of $\sigma_i$, if we transform the
$\epsilon_i$'s into $\epsilon_i/s_i$ the variance of
$\epsilon_i/s_i$ should approximate 1. The variance of the
transformed $\epsilon_i$'s is now estimated to be one and is
thus approximately constant. Accordingly, the predictor will
have more nearly constant variance. Therefore this assumption
of ordinary least squares holds and OLS regression procedures
are more appropriately performed on the transformed model.
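
The weighting scheme described above can be sketched as follows: each observation is divided by the approximate standard deviation of its median, 1.25 times the interquartile range over 1.35 times the square root of the number of data points, and ordinary least squares is then applied to the transformed variables. The function names are hypothetical and the sketch assumes the untransformed model is a straight line in total pay.

```python
import numpy as np


def median_std_error(iqr, n):
    """Gaussian-based approximation: 1.25 * IQR / (1.35 * sqrt(n))."""
    iqr = np.asarray(iqr, dtype=float)
    n = np.asarray(n, dtype=float)
    return 1.25 * iqr / (1.35 * np.sqrt(n))


def weighted_fit(pay, median_rent, iqr, n):
    """Fit median_rent = b0 + b1 * pay after dividing every term by s_i.

    Returns (b0, b1) on the original, untransformed scale.
    """
    pay = np.asarray(pay, dtype=float)
    rent = np.asarray(median_rent, dtype=float)
    s = median_std_error(iqr, n)
    # After the transformation the intercept column is 1/s_i and the
    # slope column is pay_i/s_i; there is no separate constant term.
    design = np.column_stack([1.0 / s, pay / s])
    coef, *_ = np.linalg.lstsq(design, rent / s, rcond=None)
    return coef[0], coef[1]
```

Medians built from many observations with a narrow interquartile range receive a small `s` and therefore a large weight in the fit.
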


D. ANALYSIS OF COVARIANCE MODEL 
The results of using a weighted least squares approach
with the transformed model, equation (15), indicated that this
was more sensible than using ordinary least squares; however,
another approach also seemed plausible. Analysis of Covariance
(ANCOVA) was used in which the grand mean rental cost is
adjusted within each group of paygrade, number of bedrooms and
house type by the rental cost which is determined by these
factors. Thus the ANCOVA model would become
$$Y_{ijk} = X_0 \beta_0 + X_{ijk} \beta_{jk} + \epsilon_{ijk} \qquad (18)$$
in which the $X_0 \beta_0$ term is the grand mean and the
$X_{ijk} \beta_{jk}$ term is the total pay for each group of
number of bedrooms and house type. The $Y_{ijk}$ term would
represent rental cost for each ith person dimensioned by jth
type of house and the kth number of bedrooms in the house. This
model differs from the previous model in that instead of using
medians of total pay within groups of paygrade, house type,
bedrooms, and dependency status to predict median rent, the
model used the total pay of each individual person in a costing
band and the deviations caused by differences in house type and
number of bedrooms to predict rent. Thus, in this case, total
pay becomes the continuous variable and house type and number
of bedrooms become the categorical terms. Paygrade and
dependency status were not used as class variables in this
model since total pay adequately reflected their values. Their
inclusion would cause collinearity to exist among the variables
and the regression estimates would then be biased.


E. CROSS VALIDATION TECHNIQUES 

Since the weighted least squares approach with the model
(15) and the ANCOVA approach (18) using all the data, not the
median data, were thought to be the most sensible, a cross
validation technique was used in each case to test the
parameter estimates and the models. For the weighted least
squares model, half of the data was used to determine regression
coefficients, and these coefficients were then used with the
other half of the data to calculate new fitted values. These
values were then compared to the actual observed values to find
estimates of slope and intercept. The equation
$$\sum_i (y_i - \hat{y}_i)^2 \qquad (19)$$
is the residual sum of squares. These values for the sum of the
squares of the residuals were compared for each half of the
data within each of the thirty costing bands for the weighted
least squares model. For the ANCOVA model, no provision in SAS
was available for the above described cross validation, so the
data for each costing band was randomly divided in half and the
parameter estimates of the coefficients and their standard errors
for each half of the data were compared (see results in the
Analysis chapter).
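
The split-half procedure described above can be sketched as follows. This is only an illustration: the weights (taken here as 1/x squared), the function names, and the data are hypothetical, since the weights of equation (15) are not restated in this section. The residual sum of squares is computed unweighted, as in equation (19).

```python
import random


def fit_wls(xs, ys, ws):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def ssr(xs, ys, b0, b1):
    """Residual sum of squares, as in equation (19)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def split_half_cv(rows, seed=0):
    """rows: list of (x, y, weight). Fit on one random half, score both halves."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    train, holdout = rows[:half], rows[half:]
    xs, ys, ws = zip(*train)
    b0, b1 = fit_wls(xs, ys, ws)
    xh, yh, _ = zip(*holdout)
    return ssr(xs, ys, b0, b1), ssr(xh, yh, b0, b1)


# Hypothetical costing band: total pay, median rent, and an assumed weight.
rows = [(x, 100.0 + 0.2 * x, 1.0 / x ** 2) for x in range(1000, 5000, 500)]
ssr_fit, ssr_holdout = split_half_cv(rows)
```

Because the two halves may contain different numbers of points, dividing each sum by its own number of observations would make the two values more directly comparable.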







III. ANALYSIS 


A. ANALYSIS OF CURRENT MODEL 

The current model, equation (2), was run using OLS
regression procedures with the data from the thirty costing
bands, fifteen of which contained data as specified by the Per
Diem Committee and fifteen of which contained the additional data
obtained from those military members who are in paygrades E5
and above and who share their residences. The results of the
regression analysis indicated that this model was suspect
in that it did not adequately fit the data, and would therefore
perhaps not produce an adequate prediction of median rent based
on total pay.

Initially the current model, equation (2), was run using
all of the available data within each costing band. The data
was plotted, median rent versus total pay, for each costing
band. A spread in the variance of the data was seen and in
some instances a curve was present, indicating a nonlinear
rather than linear relationship (see Appendix A). The
regression analysis results, as seen in Table 1 (see Appendix
C), showed that in twenty-three out of twenty-eight cases the
model had a significant lack of fit. (The data from the other
two costing bands contain only two data points and regression
analysis is not valid in these two cases.) The residual plots
from each of these regressions also exhibited evidence of
nonconstant variance, which was a further indication that the
model was inadequate. (These residual plots can be seen in
Appendix A.) The regression results from the costing bands
which did not exhibit a significant lack of fit did, however,
have residuals which had a nonconstant variance and were not
normally distributed. Also, the $R^2$ values in each of these
cases were extremely low (less than .32), which again served as
an indication that the model explained at most a third of
the variance.
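
For readers unfamiliar with the lack of fit test referred to above, a minimal sketch of the usual F statistic for a straight-line fit is given here. It assumes repeated observations at some values of the predictor so that pure error can be separated from lack of fit; the function name and the data are hypothetical and are not taken from the VHA survey.

```python
from collections import defaultdict


def lack_of_fit_f(xs, ys):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit.

    Requires repeated observations at some x values so that pure error
    can be estimated.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    by_x = defaultdict(list)
    for x, y in zip(xs, ys):
        by_x[x].append(y)
    # Pure error: variation of replicates about their own level mean.
    ss_pe = sum(sum((y - sum(v) / len(v)) ** 2 for y in v)
                for v in by_x.values())
    ss_lof = sse - ss_pe  # what the straight line fails to explain

    df_lof = len(by_x) - 2
    df_pe = n - len(by_x)
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe


# Hypothetical curved data with two replicates at each x level.
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [0.9, 1.1, 3.9, 4.1, 8.9, 9.1, 15.9, 16.1]
f_stat, df_lof, df_pe = lack_of_fit_f(xs, ys)
```

A value of F that is large relative to the F distribution with (df_lof, df_pe) degrees of freedom indicates that a straight line does not describe the data adequately.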

The data within each of the thirty costing bands was then
broken into two groups according to dependency status. The
"zero" group within each costing band contained the data from
those military members who had dependents, and the "one" group
contained the data from those military members who claimed no
dependents. The regression model, equation (2), was run again
using these new groupings of the data. The results of the
regression analysis again indicated that this model was
entirely inappropriate. Although there was not one case of
significant lack of fit, the residual analysis of the data, as
seen in Table 2 (Appendix C), illustrated that in twenty-six
out of twenty-eight of the costing bands the residuals were
not normally distributed. The residual plots (Appendix A)
again show nonconstant variance. In two cases, the "zero"
groups from costing bands 510 and 512, the residuals were
normally distributed with constant variance, there was no
significant lack of fit, and the F test indicated a
significant regression; nevertheless, the $R^2$ values were
less than .500, which indicates a lot of unexplained
variance. In this instance, with the data broken into groups
by dependency status, the model again was inadequate.

Next the data within each of the thirty costing bands was
broken into groups according to paygrade. Paygrade 1 consisted
of the data from military members who are in paygrades E1 to
E9. Paygrade 2 consisted of the data from military members who
are in paygrades W1-W4, O1E-O3E, and O1-O3. Paygrade 3
consisted of the data from military members in paygrades O4-O7.
Data from paygrades O8 and above are included in the data
for paygrade O7. The model, equation (2), was again tested
using this data. With the data from the costing bands broken
into groups in this manner there were 84 different cases in
which the model was tested. In fifty out of eighty-four cases,
as can be seen in Table 3 (Appendix C), a significant lack of
fit was found. Of the thirty-four cases where there was not
a significant lack of fit, twenty-eight had residuals
which were not normally distributed and had residual plots
which showed evidence of nonconstant variance. The six cases
which showed no evidence of lack of fit, and which had
residuals which were normally distributed, namely costing band
632 paygrade 3, costing band 530 paygrade 2, costing band 590
paygrade 2, costing band 570 paygrade 3, costing band 650
paygrade 3, and costing band 510 paygrade 2, all had $R^2$ values
less than .330. Thus once again there was strong evidence that
even in this case, where the data was broken into groupings
according to paygrade, the model was inadequate.

To further ensure that the model was tested under all
appropriate conditions, the data was broken into groups first
by dependency status and then further broken into groups by
paygrade. Thus the data from each costing band was broken into
"zero" or "one" groups as defined previously. The "zero" or
"one" groups were then broken into further groupings according
to paygrade. Thus the "zero" group, for example, was broken
into three further groups, paygrade 1, paygrade 2, and paygrade
3, also as previously defined. Therefore each of the twenty-eight
costing bands now has two dependency statuses and, within
each dependency status, three paygrades associated with it.

Thus the model was tested using 168 different sets of data.
The results of the regression analysis, using each of these
different data sets, can be seen in Table 4 (Appendix C). At
an alpha level of .05, three out of the 168 data sets showed
significant lack of fit. Of those data sets which did not show
a significant lack of fit, 105 had residuals which were not
normally distributed and which had residual plots which
exhibited nonconstant variance. Of the remaining sixty sets
of data, which show no significant lack of fit and normally
distributed residuals, nineteen did not have
significant overall regressions according to the F test at an
alpha level of .05. The remaining forty-one data sets, which
did not show significant lack of fit, which had normally
distributed residuals and residual plots showing constant
variance (Appendix A), and which had regressions which were
significant according to the F test, all had $R^2$ values which
were less than .440. In fact all but four of these remaining
data sets had $R^2$ values which were less than .220. Thus this
analysis indicates once again that the original model was
woefully inadequate and that in none of the cases where the
data was broken into groups according to dependency status, or
by paygrade, or by a combination of both would this model
adequately predict median rent based on total pay. An adequate
model would be one in which there was no lack of fit, the
assumptions of Least Squares Regression would hold, and the $R^2$
values would be high, indicating that the model explains the
variance of the data.
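
Since the $R^2$ values carry much of the argument above, a short sketch of how the statistic is computed may be helpful. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    sst = sum((y - ybar) ** 2 for y in ys)               # total variation
    return 1.0 - sse / sst


# Hypothetical observed and fitted rents.
observed = [400.0, 500.0, 600.0, 700.0]
fitted = [450.0, 500.0, 600.0, 650.0]
r2 = r_squared(observed, fitted)
```

A value near one means the fitted values account for most of the variation in the observed rents, while values such as those below .220 reported above leave most of it unexplained.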


B. ANALYSIS OF PROPOSED MODEL 

The proposed model, equation (13), was tested using the 
same data from the thirty costing bands as was used to test the 
current model, equation (2). The results of the regression 
analysis indicated that in certain cases the use of this model 
may be more adequate in predicting median rent from total pay; 
however it must be used with caution. 

This model, equation (13), was also tested using the same
groupings of data as used in testing the current model,
equation (2). Initially, the model was tested using all of the
data within each costing band. As in the previous model, median
rent versus total pay was plotted. The plots indicated an
increase in variance but also a strong linear relationship.
The results of the regression analysis showed that in
all twenty-eight instances (see Table 5) a significant lack of
fit was evidenced.

Next the data within each costing band was
broken into groups by dependency status. The data was labeled
with a zero if the military member had dependents and
with a one if the military member had no dependents
or had no dependents and was sharing his or her residence. The
plots of median rent versus total pay for each costing band
indicated an even stronger linear relationship than in the
original plots, but they still exhibited evidence of unequal
variance. The results of the regression analysis (see Table
6) showed that in eight out of fifty-six cases a significant
lack of fit was evidenced. Of the remaining forty-eight cases,
twelve had residuals which were not normally
distributed. The residual plots of these data sets showed that
nonconstant variance was present. The residual plots of the
thirty-six cases which did not have significant lack of fit,
which had residuals which were normally distributed, and which
were significant regressions at the alpha level .05, also
showed some evidence of nonconstant variance. Also, the $R^2$
values were in the .4 to .5 range, with the highest a value of
.55. These $R^2$ values are lower than the ones obtained with the
use of the Weighted Least Squares model, seen in the next
section, whose purpose is to reduce or eliminate the
nonconstant variance of the residuals. Thus prediction was
worse for residuals with more variance (see Appendix A).

The data within each costing band was next broken into groups by
paygrade. This procedure was the same as the one used in
testing the current model: paygrade 1 reflected paygrades E1-E9,
paygrade 2 reflected paygrades W1-W4, O1E-O3E, and O1-O3,
and paygrade 3 reflected paygrades O4-O7, with paygrades O8-O10
included in paygrade O7. When the data was broken into
these groups there were many more cases of significant lack of
fit, fifty-six out of eighty-four (see Table 7, Appendix C).
Also, because of few data points within each group, the
overall regressions in many instances were not significant.

Finally the data was broken into groups first by dependency
status and then by paygrade. The results of the regression
analysis indicated that while there were only eight cases of
significant lack of fit out of one hundred and sixty-eight
(see Table 8, Appendix C), thirty had residuals which were not
normally distributed and, because of few data points within each
group, some of the data sets did not have significant
regressions at the .05 alpha level. Of the regressions on the
data sets which did fulfill all of the criteria, the $R^2$ values
were low. Thus the model best predicted median rents from total
pay when the data was divided by dependency status. However,
this model must be viewed as possibly inaccurate since the
residual plots indicated evidence of nonconstant variance, and
a better model would predict points in an unbiased fashion.


C. ANALYSIS OF WEIGHTED LEAST SQUARES MODEL

Analysis of the Weighted Least Squares Model, equation
(15), with $Y_i$ = median rent and $X_i$ = total pay for the ith
group, was conducted in the same manner as that of the current
model, equation (2), and that of the proposed model, equation
(13). The only difference here was that initially the data
were randomly divided into two sections in order to use cross
validation procedures. The sum of the squares of the
residuals of the first division of data was compared to the sum
of the squares of the errors of the second division of data, in which
the parameter estimates from the first set of data were used
to compute the predicted values for the second set of data.
Thus the Weighted Least Squares model was first tested using
one half of all of the data available within each costing band;
next the model was tested with the half of the data which had
been divided into groups by dependency status; then the model
was tested with the half of the data which had been broken into
groups by paygrade within each costing band; and finally the
model was tested with half of the data which had been broken
first into groups according to dependency status and then by
paygrade.

The results of the regression analysis using half of all
of the data within each costing band showed (see Table 9,
Appendix C) that a significant lack of fit existed for each
costing band. When the data was broken into divisions by
dependency status, the regression analysis results (see Table
10, Appendix C) showed that seventeen out of fifty-six cases
exhibited significant lack of fit and that nine out of the
thirty-nine remaining cases did not have normally distributed
residuals. Three out of the remaining thirty cases did not
have regressions which were significant overall, and in the
remaining twenty-seven cases, in which all statistical criteria
were met, the $R^2$ values were typically between .44 and .75.
There was no evidence of nonconstant variance in the residual
plots, and the residuals appeared to be normally
distributed in most cases.

When the data was broken into groups by paygrade, only
twenty-five out of a possible eighty-four cases (see Table 11,
Appendix C) met all of the criteria of successful regression
in that they did not have significant lack of fit, their
residuals were normally distributed, and their regressions were
significant at the .05 alpha level. The $R^2$ values, however,
ranged from very low to a high of .73. Again the residual
plots appeared to indicate a fairly normal distribution with
little evidence of nonconstant variance.

The results of the regression analysis, when the data was
broken into groups both according to dependency status and
paygrade (see Table 12), showed that better than half, 93 out
of 168, met the criteria for a successful regression and had
$R^2$ values ranging mostly between .4 and .65. There were,
however, very few data points in some categories; thus these
results must be viewed with suspicion. The statistics for lack
of fit, normality of the residuals, and overall significance
of the regression all might have been affected by this small
number of data points. Therefore this model using a weighted
least squares approach, equation (15), performed best when the
data within each costing band was divided according to
dependency status.

The cross validation technique used here proved to be
unsuccessful since only the sum of squares of the residuals
(SSR) terms were compared (see Table 13, Appendix C) in the
case where all of the data was used within each costing band.
The differences between the SSR for the first group of data and
the SSR for the second group of data, with predicted values
found by employing the parameter
estimates from the first set of data for each costing band, were
quite large. This could be due to the lack of fit which was
found or due to the fact that the second group generally had
several more data points than the first group. Either of these
two factors or a combination of both might have accounted for
these tremendous differences.


D. ANALYSIS OF THE ANALYSIS OF COVARIANCE MODEL 

The results of the regression analysis on the ANCOVA model
indicated that this model may be the best model discussed thus
far for use in predicting rent based on total pay (see Table
14, Appendix C). All of the regressions were significant and
had $R^2$ values ranging from .42 to .58, with few values above or
below these numbers. The residual plots, normal plots, and
stem and leaf diagrams indicated that the residuals were
normally distributed (see Appendix C). The significance levels
of the normal statistic used to test the normality of the
residuals, however, did not, in most cases, indicate that the
residuals were normally distributed. However, the residuals
were fairly symmetric and the sample size was quite large;
therefore the model should be fairly robust to the lack of
normal fit. The residual plots showed the fairly typical box-like
pattern of randomly distributed data. The stem and leaf
and normal plots gave fairly good support for the
normality of the residuals.

In the case of several of the costing bands there did not
appear to be a significant difference in the least squares
means of the rent pertaining to different house types and
different numbers of bedrooms. This was particularly true
between house types 1 and 2 (single family home and townhouse)
and also between house types 3 and 4 (apartment and mobile
home). In some costing bands there also appeared to be no
significant difference between the least squares means of rent,
predominantly between 3 and 4 bedrooms and less
predominantly between 1 and 2 bedrooms. This
indicates that, when there is not a significant difference
between the least squares means for two different types of
housing or two residences with different numbers of bedrooms,
either of the parameter estimates for the two types of housing or
numbers of bedrooms may be used to predict rent. Thus the
ANCOVA model, which predicted rent based on the total pay
associated with number of bedrooms and house type, may not have
been completely correct in these cases since the mean amount
of rent associated with each type of house or number of
bedrooms may not have been different.

The cross validation technique used here, since GLM does
not provide a vehicle to compute the Sum of Squares of the
Residuals from previously calculated parameter estimates, was
one in which the data was randomly divided into two sections
and, after the ANCOVA model was run on both sets of data, the
slope parameter estimate and its standard
error were compared. A comparison of the slope parameter and
its standard error between the two sections of data from each
costing band revealed that the model was not at serious fault,
since in both of the sections of the data the slope parameter
estimates were very close and the standard errors were small
and similar (see Table 14).
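
The comparison of slope estimates between the two halves was made by inspection. One common way to put such a comparison on a numerical footing, offered here only as an illustrative assumption and not as the procedure used in this thesis, is an approximate z statistic for the difference of two independent estimates. The function name and the numbers are hypothetical.

```python
import math


def slope_difference_z(b1_a, se_a, b1_b, se_b):
    """Approximate z statistic for the difference of two independent slopes."""
    return (b1_a - b1_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical slope estimates and standard errors from the two halves.
z = slope_difference_z(0.105, 0.004, 0.100, 0.003)
```

An absolute value of z well below about 2 would be consistent with the two halves sharing the same slope.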


IV. CONCLUSIONS AND RECOMMENDATIONS


The purpose of this thesis was to test and validate the
current model, equation (2), to see if it could effectively be
used to predict rent based on total pay from the survey data
which had been arranged in a newly devised, simplified format.
If the current model was deemed invalid or suspect, then the
second purpose of this thesis was to propose a better, more
sensible model which would adequately predict rent based on
total pay.

There are two major conclusions from the analysis contained
in this thesis. The first conclusion is that the current
model, equation (2), should not be used to predict median rents
in each paygrade and dependency status when the data is divided
into costing bands in the manner previously described. This
conclusion is justified by the results of the regression
analysis, which show that this model is inadequate and may not
accurately predict median rent. The second conclusion is that
both the weighted least squares model and the ANCOVA model are
possible alternative models for use in predicting rent based
on total pay. They are shown to be at least as reasonable as
the current model, if not better. The ANCOVA model may be
preferable for predicting a mean rather than a median rent. Also,
the ANCOVA model may be preferable if the model is used to
determine owner equivalency rents. If a median rent figure
must be used in the congressionally mandated formula for the
computation of VHA, the weighted least squares model is
preferable.

The secondary purpose of this thesis was to determine if
the data from military personnel in paygrades E5 and above who
share housing should be used or discarded, since these data had
been previously discarded on the basis of a policy decision
without any statistical backing. Curiously enough, there seems
to be no systematic difference across all of the models
investigated in relation to the addition of this data. In some
instances, when regression analysis results from the same two
costing bands, one which contained the additional data and one
which did not, were compared, lack
of fit was affected. Also in some cases the significance of
the regression would be affected, or the $R^2$
values would go up or down. There was, however, no instance in
which, for example, all of the $R^2$ values would go up or all of
the significance of regression statistics would suddenly
increase or decrease for a certain model. The important
consideration here was that the additional data did affect $R^2$
values; it did affect the lack of fit, significance value
statistics, and the normality of residuals. Thus while the
additional data did not have a systematic effect, it did have
an effect, and this aspect should not be overlooked when a
decision is made whether or not to include these data when VHA
rates are actually calculated.

There are several recommendations for further analysis.
First, the way in which the data is broken into costing bands
must be investigated. Perhaps a better method or a different
dollar figure could be used to divide the data into costing
bands. If a different method is used and the data contained
in each costing band is different, analysis of each of the
regression models discussed in this paper must be redone. If
the data is put into costing bands other than the
ones used in this thesis, the models discussed may be more or
less accurate predictors of median rent. In either case the
original data must be investigated and natural breaks in the
data must be discovered in order to achieve the best placement
of data into costing bands. A second area which requires
further analysis concerns the ANCOVA model. The data, before
testing the ANCOVA model, should be divided into groups either
by dependency status or by paygrade. A better fit of the
regression model may be accomplished in either case. Other
models should also be investigated as possible solutions to the
problem. Perhaps instead of the weighted least squares,
another transformation on the data could be devised which may
provide a better model. Since there is an indication of non-normal
errors, perhaps GLIM (Generalized Linear Models) could
be used for more accurate prediction [Ref. 4]. Further
analysis and other models should still be investigated as
possible predictors of median rents for the VHA.


APPENDIX A. SCATTER AND RESIDUAL PLOTS


A. USING DATA SET 540 AS AN EXAMPLE, SCATTER AND RESIDUAL PLOTS 
FOR THE CURRENT MODEL. 

Figure 1. Data Set 540. 1/Median Rent vs. 1/Total Pay.

Figure 2. Data Set 540. Residuals vs. Predicted Values.

Figure 3. Data Set 540.
Dependency Status '0'.
1/Median Rent vs. 1/Total Pay.

Figure 4. Data Set 540.
Dependency Status '1'.
1/Median Rent vs. 1/Total Pay.

Figure 5. Data Set 540.
Dependency Status '0'.
Residuals vs. Predicted Values.

Figure 7. Data Set 540.
Paygrade '1'.
1/Median Rent vs. 1/Total Pay.

Figure 8. Data Set 540.
Paygrade '2'.
1/Median Rent vs. 1/Total Pay.

Figure 9. Data Set 540.
Paygrade '3'.
1/Median Rent vs. 1/Total Pay.

Figure 10. Data Set 540.
Paygrade '1'.
Residuals vs. Predicted Values.

Figure 11. Data Set 540.
Paygrade '2'.
Residuals vs. Predicted Values.

Figure 12. Data Set 540.
Paygrade '3'.
Residuals vs. Predicted Values.

Figure 13. Data Set 540.
and Paygrade
1/Median Rent vs. 1/Total Pay.


[Scatter plot; horizontal axis ITOTP. Legend: A = 1 obs., B = 2 obs., etc.]


Figure 14. Data Set 540.
Dependency Status '0' and Paygrade '2'.
1/Median Rent vs. 1/Total Pay.

Figure 15. Data Set 540.
Dependency Status '0' and Paygrade '3'.
1/Median Rent vs. 1/Total Pay.

Figure 16. Data Set 540.

Figure 17. Data Set 540.
Dependency Status '0' and Paygrade '2'.
Residuals vs. Predicted Values.

Figure 18. Data Set 540.
Dependency Status '0' and Paygrade '3'.
Residuals vs. Predicted Values.

Figure 19. Data Set 540.
Dependency Status '1'
1/Median Rent vs.

Figure 20. Data Set 540.
Dependency Status '1' and Paygrade '2'.
1/Median Rent vs. 1/Total Pay.

Figure 21. Data Set 540.
1/Median Rent vs.

Data Set 540.
'1' and Paygrade
Predicted Values.

Figure 23. Data Set 540.
Dependency Status '1' and Paygrade '2'.

Figure 24. Data Set 540.
Dependency Status '1' and Paygrade '3'.
Residuals vs. Predicted Values.

Median Rent vs. Total Pay.

[Residual plot; horizontal axis PREDICTED VALUE. Legend: A = 1 obs., B = 2 obs., etc.]

Figure 26. Data Set 540.
Residuals vs. Predicted Values.


Figure 27. Data Set 540.
Dependency Status '0'.
Median Rent vs. Total Pay.

Figure 28. Data Set 540.
Dependency Status '1'.
Median Rent vs. Total Pay.

Figure 29. Data Set 540.
Dependency Status '0'.
Residuals vs. Predicted Values.

Figure 30. Data Set 540.
Dependency Status '1'.
Residuals vs. Predicted Values.

Figure 31. Data Set 540.
Paygrade '1'.
Median Rent vs. Total Pay.

Figure 32. Data Set 540.
Paygrade
Median Rent vs. Total Pay.

Figure 33. Data Set 540.
Paygrade '3'.
Median Rent vs. Total Pay.

Figure 34. Data Set 540.
Paygrade '1'.
Residuals vs. Predicted Values.

Figure 35. Data Set 540.
Paygrade '2'.

Figure 36. Data Set 540.
Paygrade '3'.
Residuals vs. Predicted Values.

Figure 37. Data Set 540.
Dependency Status '0' and Paygrade '1'.
Median Rent vs. Total Pay.

Figure 38. Data Set 540.
Dependency Status '0' and Paygrade '2'.
Median Rent vs. Total Pay.


[Scatter plot; horizontal axis TOTP. Legend: A = 1 obs., B = 2 obs., etc.]


Figure 39. Data Set 540.
Dependency Status '0' and Paygrade
Median Rent vs. Total Pay.

Figure 40. Data Set 540.
Dependency Status '0' and Paygrade '1'.
Residuals vs. Predicted Values.

Figure 41. Data Set 540.
Dependency Status '0' and Paygrade '2'.

'0' and Paygrade
Predicted Values.

Figure 43. Data Set 540.
Dependency Status '1' and Paygrade '1'.
Median Rent vs. Total Pay.

Figure 44. Data Set 540.
Dependency Status '1' and Paygrade '2'.
Median Rent vs. Total Pay.

Figure 45. Data Set 540.
Dependency Status '1' and Paygrade '3'.
Median Rent vs. Total Pay.

Figure 48. Data Set 540.
Dependency Status '1' and Paygrade '3'.
Residuals vs. Predicted Values.

Figure 49. Data Set 540. Residuals vs. Predicted Values.

Figure 50. Data Set 540.
Dependency Status '0'.
Median Rent vs. Total Pay.

Figure 51. Data Set 540.
Dependency Status '1'.
Median Rent vs. Total Pay.

Figure 52. Data Set 540.
Dependency Status '0'.

Figure 53. Data Set 540.
Dependency Status '1'.
Residuals vs. Predicted Values.

Figure 55. Data Set 540.
Paygrade '2'.
Median Rent vs. Total Pay.

Figure 56. Data Set 540.
Paygrade '3'.
Median Rent vs. Total Pay.
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Figure 57. Data Set 540. 
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Figure 58. Data Set 540. 
Paygrade '2'.
Residuals vs. Predicted Values. 
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Figure 59. Data Set 540. 
Paygrade '3'. 
Residuals vs. Predicted Values.
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Figure 60. Data Set 540. 
Dependency Status '0' and Paygrade '1'. 
Median Rent vs. Total Pay. 
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Figure 61. Data Set 540. 
Dependency Status '0' and Paygrade '2'.
Median Rent vs. Total Pay. 
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Figure 62. Data Set 540. 
Dependency Status '0' and Paygrade '3'. 
Median Rent vs. Total Pay. 
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Figure 63. Data Set 540. 
Dependency Status '0' and Paygrade '1'. 
Residuals vs. Predicted Values. 
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Figure 64. 
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Figure 65. Data Set 540. 
Dependency Status '0' and Paygrade '3'. 
Residuals vs. Predicted Values. 
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Figure 66. Data Set 540. 
Dependency Status '1' and Paygrade '1'.
Median Rent vs. Total Pay. 
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Figure 67. Data Set 540. 
Dependency Status '1' and Paygrade '2'. 
Median Rent vs. Total Pay. 
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Figure 68. Data Set 540.
Dependency Status
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Figure 69. Data Set 540. 
Dependency Status '1' and Paygrade '1'.
Residuals vs. Predicted Values. 
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Figure 70. Data Set 540. 
Dependency Status '1' and Paygrade '2'. 
Residuals vs. Predicted Values. 
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Figure 71. Data Set 540. 
Dependency Status '1' and Paygrade '3'. 
Residuals vs. Predicted Values. 








D. USING DATA SET 540 AS AN EXAMPLE, STEM AND LEAF, NORMAL PLOTS, 
AND RESIDUAL PLOTS FOR THE ANCOVA MODEL. 
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Figure 72. Data Set 540. Residuals vs. Predicted Values. 
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Figure 73. Data Set 540. Stem and Leaf and Normal Plots. 
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